Refactor parsing of numeric ASCII lists-of-lists #772

Open
RamogninoF wants to merge 1 commit into gerlero:main from RamogninoF:parsing-ascii-listlist

Conversation

@RamogninoF
Updated parsing of numeric ASCII lists-of-lists to support sub-lists of arbitrary length (instead of only the hardcoded lengths 3 and 4).

This is required to parse faceLists stored in ASCII format, such as the one below, whose faces can have an arbitrary number of vertices in polyhedral meshes (often 5 or more). Such files would fall back to the default parser, leading to very long parsing times even for very small meshes.

/*--------------------------------*- C++ -*----------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2112                                  |
|   \\  /    A nd           | Website:  www.openfoam.com                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    arch        "LSB;label=32;scalar=64";
    class       faceList;
    location    "constant/polyMesh";
    object      faces;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //


80867
(
4(0 1 20587 20586)
// ...
4(21262 21263 21264 21265)
4(21266 21267 21268 21269)
6(21270 21271 21272 21273 21274 21275) // <------
4(21276 21277 21278 21279)
4(21280 21281 21282 21283)
5(21224 21223 21284 21285 21286) // <------
// ...
)


// ************************************************************************* //

…y length of sub-lists (instead of hardcoded 3 and 4 length values). This is required to parse faceLists in ascii format which can have arbitrary number of vertices for poly meshes (often 5+)
@RamogninoF
Author

@gerlero I am not an expert, and I wrote this with the help of AI, based on #727, where you previously helped me with a similar topic. I have added tests, which seem to work properly, and the previous functionality still appears to work.

Let me know if this works correctly and does not break anything!

Thanks!!!

@gerlero
Owner

gerlero commented Apr 10, 2026

@RamogninoF thanks! I'll take a look.

In principle, this shouldn't be possible with regular expressions alone (which foamlib's parser uses to be fast enough), but I'll look at the code to see what it's doing.

+ _SKIP.pattern
+ rb")?\)"
_SUB_LIST_LIKE = re.compile(
rb"(?:" + _POSSIBLE_INTEGER.pattern + rb")(?:" + _SKIP.pattern + rb")?\([^()]*?\)"
Owner

@RamogninoF the problem, to me, is that this doesn't actually check that the sub-list is well-formed. E.g., it will readily accept a list with a wrong count, like 2 (1 2 3)...
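The concern above can be reproduced with a minimal sketch. The definitions of `_POSSIBLE_INTEGER` and `_SKIP` are not shown in this thread, so the two patterns below are simplified stand-ins (assumptions), not foamlib's actual ones; only the way they are combined is copied from the diff.

```python
import re

# Simplified stand-ins (assumptions): foamlib's real _POSSIBLE_INTEGER and
# _SKIP patterns are not shown in this thread.
POSSIBLE_INTEGER = re.compile(rb"-?\d+")
SKIP = re.compile(rb"\s+")

# Same construction as _SUB_LIST_LIKE in the diff, using the stand-ins.
SUB_LIST_LIKE = re.compile(
    rb"(?:" + POSSIBLE_INTEGER.pattern + rb")(?:" + SKIP.pattern + rb")?\([^()]*?\)"
)

well_formed = b"3(1 2 3)"
wrong_count = b"2 (1 2 3)"  # declares 2 items but contains 3

# The pattern never compares the prefix with the number of items,
# so both inputs are accepted.
accepts_well_formed = SUB_LIST_LIKE.fullmatch(well_formed) is not None
accepts_wrong_count = SUB_LIST_LIKE.fullmatch(wrong_count) is not None
print(accepts_well_formed, accepts_wrong_count)
```

With these stand-ins, both inputs match, because `[^()]*?` accepts any content between the parentheses regardless of the declared count.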

Author

Unfortunately, I have no background in parsing logic, and all of this is far beyond my capabilities; I just hope this can be a useful starting point for you. In its current state, parsing meshes in ASCII format is simply impossible because of the time required (I gave up even on a 10k-cell mesh after parsing the faces file had taken more than 10 minutes). I would like to be able to support ASCII meshes as well, rather than only binary ones.

Author

A simple alternative could be to just add hardcoded parsers for faces with up to 10 vertices or so, which I think would be more than enough for most cases.
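One way this suggestion might look is sketched below: one fixed-length pattern is generated per vertex count and the results are joined into a single alternation, so a mismatched count cannot match. The names `INT`, `MAX_VERTICES` and `fixed_length_pattern` are hypothetical, and the integer and whitespace patterns are simplified assumptions (comments inside a sub-list are not handled); this is not foamlib's implementation.

```python
import re

INT = rb"-?\d+"    # simplified integer pattern (assumption)
MAX_VERTICES = 10  # hypothetical upper bound on vertices per face


def fixed_length_pattern(n: int) -> bytes:
    """Pattern for a sub-list whose prefix is n and that holds exactly n integers."""
    return rb"%d\s*\(\s*%s(?:\s+%s){%d}\s*\)" % (n, INT, INT, n - 1)


# One alternative per supported length, from triangles up to MAX_VERTICES.
ANY_FACE = re.compile(
    b"|".join(fixed_length_pattern(n) for n in range(3, MAX_VERTICES + 1))
)

hexagon = b"6(21270 21271 21272 21273 21274 21275)"
mismatched = b"4(1 2 3)"  # declares 4 items but contains 3

print(ANY_FACE.fullmatch(hexagon) is not None)     # accepted
print(ANY_FACE.fullmatch(mismatched) is not None)  # rejected
```

The trade-off is that faces with more than `MAX_VERTICES` vertices would still need some other code path.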

Author

Or what if the string is parsed twice via regex, once to retrieve the list and once to get the prefix marking its length, and the two are then compared to validate the parsed data before returning?
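A rough sketch of that idea follows, using one regex with two capture groups rather than two separate passes, and comparing the declared length against the number of items actually found. `SUB_LIST` and `parse_faces` are hypothetical names, and the pattern is a simplified assumption that ignores comments. The validation loop runs in Python, so it may give up part of the speed that a regex-only approach is meant to provide.

```python
import re

# Group 1: declared length prefix; group 2: raw contents between parentheses.
# Simplified pattern (assumption); comments inside sub-lists are not handled.
SUB_LIST = re.compile(rb"(\d+)\s*\(([^()]*)\)")


def parse_faces(data: bytes) -> list[list[int]]:
    """Parse sub-lists like b'4(0 1 2 3)' and check each declared length."""
    faces = []
    for match in SUB_LIST.finditer(data):
        declared = int(match.group(1))
        values = [int(token) for token in match.group(2).split()]
        if len(values) != declared:
            raise ValueError(
                f"sub-list declares {declared} items but contains {len(values)}"
            )
        faces.append(values)
    return faces


faces = parse_faces(b"4(0 1 20587 20586)\n5(21224 21223 21284 21285 21286)")
print(faces)
```

Here a malformed entry such as `2 (1 2 3)` raises `ValueError` instead of being silently accepted.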

2 participants